The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, being able to process RAW 12MP photos directly on mobile phones under 1.5 second and producing high perceptual photo quality. To train and to evaluate the performance of the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting on thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of tradition ISP pipelines, while outperforming the previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs or APUs that can be used to further reduce the latency of the model to as little as 0.5 second. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2
translated by 谷歌翻译
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
尽管最近从遮挡和嘈杂的面部图像中的3D面部重建的发展,但性能仍然不满意。主要挑战之一是在面部图像中处理中等至重闭塞。另外,面部图像中的噪声抑制了面部属性的正确捕获,从而需要可靠地解决。此外,大多数现有方法依赖于额外的依赖性,对培训过程构成了许多约束。因此,我们提出了一种自我监督的强制性指导(流氓)框架,以获得面部图像中的遮挡和噪声的鲁棒性。所提出的网络包含1)指导管线,用于获得清洁面的3D面系数,以及2)稳定流水线,以获取封闭或噪声图像的估计系数与清洁对应物之间的估计系数之间的一致性。所提出的图像和特征级损失功能有助于流氓学习过程而不会构成额外的依赖性。在Celeba的测试数据集的三种变化:理性闭塞,妄想闭塞和嘈杂的面部图像,我们的方法优于当前的最先进的方法(例如,基于形状的3D顶点错误,合理闭塞的0.146〜0.048的减少,从0.292〜0.061,妄想闭塞和面部图像中的噪声为0.269至0.053),展示了所提出的方法的有效性。
translated by 谷歌翻译
This paper expounds the design and control of a new Variable Stiffness Series Elastic Actuator (VSSEA). It is established by employing a modular mechanical design approach that allows us to effectively optimise the stiffness modulation characteristics and power density of the actuator. The proposed VSSEA possesses the following features: i) no limitation in the work-range of output link, ii) a wide range of stiffness modulation (~20Nm/rad to ~1KNm/rad), iii) low-energy-cost stiffness modulation at equilibrium and non-equilibrium positions, iv) compact design and high torque density (~36Nm/kg), and v) high-speed stiffness modulation (~3000Nm/rad/s). Such features can help boost the safety and performance of many advanced robotic systems, e.g., a cobot that physically interacts with unstructured environments and an exoskeleton that provides physical assistance to human users. These features can also enable us to utilise variable stiffness property to attain various regulation and trajectory tracking control tasks only by employing conventional controllers, eliminating the need for synthesising complex motion control systems in compliant actuation. To this end, it is experimentally demonstrated that the proposed VSSEA is capable of precisely tracking desired position and force control references through the use of conventional Proportional-Integral-Derivative (PID) controllers.
translated by 谷歌翻译
Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the ``objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. Additionally, it is extremely lightweight (0.4 MB memory requirement) and suitable for mobile and robotic applications. The dataset split and code will be made publicly available upon acceptance.
translated by 谷歌翻译
A self-supervised adaptive low-light video enhancement (SALVE) method is proposed in this work. SALVE first conducts an effective Retinex-based low-light image enhancement on a few key frames of an input low-light video. Next, it learns mappings from the low- to enhanced-light frames via Ridge regression. Finally, it uses these mappings to enhance the remaining frames in the input video. SALVE is a hybrid method that combines components from a traditional Retinex-based image enhancement method and a learning-based method. The former component leads to a robust solution which is easily adaptive to new real-world environments. The latter component offers a fast, computationally inexpensive and temporally consistent solution. We conduct extensive experiments to show the superior performance of SALVE. Our user study shows that 87% of participants prefer SALVE over prior work.
translated by 谷歌翻译
Hyperparameter tuning is critical to the success of federated learning applications. Unfortunately, appropriately selecting hyperparameters is challenging in federated networks. Issues of scale, privacy, and heterogeneity introduce noise in the tuning process and make it difficult to evaluate the performance of various hyperparameters. In this work, we perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We first identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can significantly impact tuning methods-reducing the performance of state-of-the-art approaches to that of naive baselines. To address noisy evaluation in such scenarios, we propose a simple and effective approach that leverages public proxy data to boost the evaluation signal. Our work establishes general challenges, baselines, and best practices for future work in federated hyperparameter tuning.
translated by 谷歌翻译
We present a simple approach which can turn a ViT encoder into an efficient video model, which can seamlessly work with both image and video inputs. By sparsely sampling the inputs, the model is able to do training and inference from both inputs. The model is easily scalable and can be adapted to large-scale pre-trained ViTs without requiring full finetuning. The model achieves SOTA results and the code will be open-sourced.
translated by 谷歌翻译
We propose a novel hematoxylin and eosin (H&E) stain normalization method based on a modified U-Net neural network architecture. Unlike previous deep-learning methods that were often based on generative adversarial networks (GANs), we take a teacher-student approach and use paired datasets generated by a trained CycleGAN to train a U-Net to perform the stain normalization task. Through experiments, we compared our method to two recent competing methods, CycleGAN and StainNet, a lightweight approach also based on the teacher-student model. We found that our method is faster and can process larger images with better quality compared to CycleGAN. We also compared to StainNet and found that our method delivered quantitatively and qualitatively better results.
translated by 谷歌翻译
安全是每个机器人平台的关键特性:任何控制政策始终遵守执行器限制,并避免与环境和人类发生冲突。在加强学习中,安全对于探索环境而不会造成任何损害更为基础。尽管有许多针对安全勘探问题的建议解决方案,但只有少数可以处理现实世界的复杂性。本文介绍了一种安全探索的新公式,用于强化各种机器人任务。我们的方法适用于广泛的机器人平台,即使在通过探索约束歧管的切线空间从数据中学到的复杂碰撞约束下也可以执行安全。我们提出的方法在模拟的高维和动态任务中实现了最先进的表现,同时避免与环境发生冲突。我们在Tiago ++机器人上展示了安全的现实部署,在操纵和人类机器人交互任务中取得了显着的性能。
translated by 谷歌翻译